Support Nessie Catalog in Iceberg connector#11701
66fed2e to
191715e
Compare
|
@findepi looks like one of the CI jobs timed out. would it be possible to restart just this one job? |
|
@nastra GitHub Actions recently introduced "Re-run failed jobs". The feature might not be stable yet though. By the way, we usually ask contributors to push an empty commit to trigger CI if necessary. |
|
@ebyhr Unfortunately I don't have permissions to re-run the failed job myself, which is why I closed & reopened the PR. That triggers a full CI run again (same as pushing) |
This library seems to be compiled with Java 12, but Trino works with Java 11 |
|
@findinpath I can see the same warning in other CI runs on other branches. Here's an example from Nessie itself is also compiled with Java 11 btw |
191715e to
6beef07
Compare
|
The tests should be normal unit tests, not ITs. You can probably use Testcontainers for Nessie. |
|
It would be nice to see some tests that run SQL statements through the query runner, to actually test how the connector works with the new catalog type. Ideally (at a later point) we would also have a product tests environment (similar to |
|
Thank you @nastra for your contribution. I am looking forward to helping out in this PR to add support for Nessie in the Trino Iceberg connector. |
585fa60 to
be8ec7c
Compare
|
@findinpath thanks for the review. I updated the code based on your review and pushed a new commit. Will look into doing some additional testing via SQL statements now. |
|
@nastra you can use the Glue related tests from the connector as a rough template on what kinds of tests you need to add. |
247f4ee to
9466225
Compare
Does the client need to know the reference configured for the connector?
yes as it needs to operate on the correct branch/tag
I meant, does the client need to know this detail while it is being created?
Otherwise said, should the catalog deliver the branch/tag name on each call to the client?
eventually this is what we might end up having (the catalog providing the branch/tag to the client), but I didn't want to over-complicate the current implementation. That being said, I would prefer to adjust this part of the code once the catalog knows which branch/tag it is being used with when executing a particular SQL statement
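To make the two options in this thread concrete, here is a minimal Java sketch. All class and method names (`BoundClient`, `PerCallClient`, `PerCallCatalog`, `loadTable`) are hypothetical and are neither the Nessie client API nor code from this PR. The returned strings only stand in for "the table as seen on that reference".

```java
// Hypothetical names throughout; this is not the Nessie client API.

// Option A: the client is bound to one branch/tag when it is created.
final class BoundClient {
    private final String ref;

    BoundClient(String ref) {
        this.ref = ref;
    }

    // Every call implicitly targets the reference chosen at construction time.
    String loadTable(String tableName) {
        return tableName + "@" + ref;
    }
}

// Option B: the client is reference-agnostic and receives the ref on each call.
final class PerCallClient {
    String loadTable(String ref, String tableName) {
        return tableName + "@" + ref;
    }
}

// In option B the catalog decides which branch/tag applies and passes it along.
final class PerCallCatalog {
    private final PerCallClient client;
    private final String defaultRef;

    PerCallCatalog(PerCallClient client, String defaultRef) {
        this.client = client;
        this.defaultRef = defaultRef;
    }

    // Falls back to the reference configured for the connector.
    String loadTable(String tableName) {
        return client.loadTable(defaultRef, tableName);
    }

    // Lets a single statement target a different branch/tag.
    String loadTable(String ref, String tableName) {
        return client.loadTable(ref, tableName);
    }
}
```

Option A matches the simpler implementation described above. Option B would only pay off once the catalog can learn the branch/tag per SQL statement.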
9466225 to
86ce1dc
Compare
|
Could you please add documentation on how to set up the Nessie catalog in the Iceberg connector? You can use https://github.com/trinodb/trino/pull/11772/files#diff-e1aabf1cfd8bd8aa7d1b75e70089b57413b2e620a5eebeb36cc76fd3f2ac60db as a template for getting started. |
Hey @Cerebus. Currently, the primary focus is on the JDBC catalog, but let's see if @findepi has any initial comments. |
|
Hi @bitsondatadev, can you help us get this PR reviewed and approved? It's great that the Iceberg REST catalog got merged. As Nessie is an open source project with support in many query engines, we would like to add support for Nessie to Trino as well. |
|
PR is rebased and ready for review. Thanks @nastra |
|
@findinpath can you give this another look? |
|
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
|
@electrum, @findinpath, @ebyhr, @findepi: |
|
@electrum, @findinpath, @ebyhr, @findepi, @ajantha-bhat: |
|
Hi Trino community, it's been more than a year since this PR was opened. All of the review comments have been addressed and it has been rebased many times. Can we get some feedback from the core team as to why this PR has not been approved? The Nessie catalog is a core Iceberg catalog and implements the Iceberg Catalog API. |
|
@nastra Can you rebase this, then I'll merge it |
|
PR is rebased. Thanks @nastra |
|
@electrum anything else needed to get this across the finish line? |
|
I think we should re-review this PR. For instance, |
|
93 successful, 2 skipped, 1 cancelled, and 1 failing checks
It seems the failure is unrelated to the new changes and also it is in the hive connector (changes are for the iceberg connector). Closing and re-opening the PR to retrigger the build. |
|
@ebyhr: Thanks for the review. Fixed all the comments. PR is ready for review. |
ebyhr
left a comment
Please squash commits into one and remove commit body.
  }

- @Test(groups = {ICEBERG, PROFILE_SPECIFIC_TESTS, ICEBERG_REST, ICEBERG_JDBC}, dataProvider = "storageFormatsWithSpecVersion")
+ @Test(groups = {ICEBERG, PROFILE_SPECIFIC_TESTS, ICEBERG_REST, ICEBERG_JDBC, ICEBERG_NESSIE}, dataProvider = "storageFormatsWithSpecVersion")
Some tests are missing ICEBERG_NESSIE group.
the idea was to just run a few tests with nessie to keep CI times low and to show that trino+spark+nessie work
Co-authored-by: Ajantha Bhat <ajanthabhat@gmail.com>
|
The failed test Re-triggering the build |
|
@ajantha-bhat Please note that you don't need to retrigger CI when the failed test is unrelated to the change. Also, we recommend pushing an empty commit instead of reopening PR when retriggering CI. |
Thanks. I am new to Trino contributions. Good to know. |
|
PR is ready. Thanks. |
This PR integrates the [Nessie catalog functionality](https://github.com/apache/iceberg/tree/master/nessie/src/main/java/org/apache/iceberg/nessie) into the Iceberg connector. It adds the following new things:
- CatalogType called NESSIE
- IcebergNessieCatalogModule that sets up all necessary dependencies (including a client to connect to the Nessie server)
- NessieConfig that includes configuration settings required for Nessie
- TrinoNessieCatalog + NessieIcebergTableOperations that implement the main behavior of the catalog
- nessie-apprunner-maven-plugin prior to the integration-test phase)
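As a rough illustration of what a config holder like NessieConfig has to do, here is a minimal Java sketch. It is not the class from this PR: `NessieConfigSketch`, its fields, the three property keys, and the `main` default reference are all assumptions made for the example, so check the connector documentation for the real names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical, simplified stand-in for a Nessie config holder (not the PR's class).
final class NessieConfigSketch {
    // Assumed property keys; the names actually shipped may differ.
    static final String URI_KEY = "iceberg.nessie-catalog.uri";
    static final String REF_KEY = "iceberg.nessie-catalog.ref";
    static final String WAREHOUSE_KEY = "iceberg.nessie-catalog.default-warehouse-dir";

    final String serverUri;
    final String referenceName;
    final String defaultWarehouseDir;

    private NessieConfigSketch(String serverUri, String referenceName, String defaultWarehouseDir) {
        this.serverUri = serverUri;
        this.referenceName = referenceName;
        this.defaultWarehouseDir = defaultWarehouseDir;
    }

    // Parses catalog properties text and applies an assumed default branch of "main".
    static NessieConfigSketch fromProperties(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String uri = props.getProperty(URI_KEY);
        if (uri == null) {
            throw new IllegalArgumentException(URI_KEY + " must be set");
        }
        return new NessieConfigSketch(
                uri,
                props.getProperty(REF_KEY, "main"),
                props.getProperty(WAREHOUSE_KEY));
    }
}
```

With this sketch, a properties text that sets only the URI and the warehouse directory yields a config pointing at the `main` reference, and a missing URI fails fast with an `IllegalArgumentException`.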